Time-division multiplexing circuit-switching router

ABSTRACT

A time-division multiplexing circuit-switching router comprises a plurality of input means (i₁, . . . , i_(N)), at least one output means (o₁, . . . , o_(M)), switching means for switching between said input means (i₁, . . . , i_(N)) and said output means (o₁, . . . , o_(M)) and for connecting a selected input means to output means during a predetermined time slot, and a router table means for controlling said switching means, said router table means including instructions which input means be connected to output means for a predetermined time slot. Said router table means is divided into a plurality of tables, each table having a weight which specifies the amount of bandwidth per reservation in one table in relation to a reservation in the other table(s).

FIELD OF THE INVENTION

The present invention relates to a time-division multiplexing circuit-switching router, comprising a plurality of input means, at least one output means, switching means for switching between the input means and the output means and for connecting a selected input means to output means during a predetermined time slot, and a router table means for controlling said switching means, said router table means including instructions which input means be connected to output means for a predetermined time slot.

BACKGROUND OF THE INVENTION

To realize precision in latency and throughput for communication over shared interconnection, conventional communication architectures typically rely on the arbitration scheme called time-division multiple access (TDMA). An arbitration scheme performs contention resolution and is essential for communication over shared interconnect lines. TDMA works like a time wheel of slots, where each slot can be statically reserved for a unique master. If the time wheel consists of S slots and each slot takes an equal amount of time, then every slot reservation corresponds to 1/S of the available bandwidth B of the bus. Multiple slots have to be reserved for connections that need more bandwidth than B/S. The slot reservations are stored in a table, which is typically implemented by an embedded memory such as a random access memory (RAM) or a first-in-first-out (FIFO) buffer.

A problem arises when the range of bandwidth requirements of the programmed connections is large (e.g. 1 Mb/s to 20 Gb/s). Then either many slots (>20000 for the given example) are needed in the time wheel, or some other mechanism is needed to realize such a large ratio with fewer than 20000 slots.
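The slot arithmetic of the time wheel can be illustrated by the following minimal sketch. The function names and the choice of a 20 Gb/s link are assumptions made for illustration only; they do not appear in the source. The ratio 20 Gb/s to 1 Mb/s gives the lower bound of 20000 slots mentioned above.

```python
import math
from fractions import Fraction

MBPS, GBPS = 10**6, 10**9

def min_wheel_size(min_bps: int, max_bps: int) -> int:
    """Smallest S for which one slot (B/S) is no larger than min_bps when B = max_bps."""
    return math.ceil(Fraction(max_bps, min_bps))

def slots_needed(required_bps: int, link_bps: int, num_slots: int) -> int:
    """Slots a connection must reserve on a wheel of num_slots equal slots."""
    per_slot = Fraction(link_bps, num_slots)  # B/S, kept exact
    return math.ceil(required_bps / per_slot)

# Hypothetical link of 20 Gb/s whose smallest connection is 1 Mb/s.
wheel = min_wheel_size(1 * MBPS, 20 * GBPS)
high = slots_needed(1 * GBPS, 20 * GBPS, wheel)
```

Exact fractions are used instead of floating point so that rounding up never over-counts a slot because of representation error.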

Managing the complexity of designing chips containing billions of transistors requires decoupling computation from communication. For communication, scalable and compositional interconnects, such as networks on chip (NoC), must be used. So, the future of on-chip communication is an on-chip network of routers. Circuit-switching allows a connection to be established over a conceptual physical path from a source to a destination. An on-chip router network consists, among other parts, of interconnected routers.

U.S. Pat. No. 4,466,060 A discloses an adaptive distributed message routing algorithm for controlling the routing of data messages in a packet message switching digital computer network. Network topology information is exchanged only between neighbour nodes in the form of minimum spanning trees, referred to as exclusionary trees.

An exclusionary tree is formed by excluding the neighbour node and its links from the tree. From the set of received exclusionary trees, a route table and the transmitted exclusionary trees are constructed.

WO 01/89158 A1 discloses a method for controlling resources in a communication network comprising nodes interconnected by links, each carrying a bitstream which is divided into frames, each frame in turn being divided into time slots which are allocatable to form circuit-switched channels. Resources in the form of write access to time slots are associated with administrative entities. Allocation of resources is then done in such a way that the allocation of resources to channels pertaining to a subject administrative entity is guaranteed to the extent by which resources have been associated with the subject administrative entity.

In an on-chip router network using time-division multiplexing (TDM), physical links can be shared to achieve a higher utilization of the interconnect resources. This requires control to set a switch inside the router, and this control information is stored in a so-called slot, i.e. a predetermined unit of time, or router table.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a time-division multiplexing circuit-switching router which can be used in an on-chip router network at reduced cost.

In order to achieve the above and further objects, there is provided a time-division multiplexing circuit-switching router, comprising a plurality of input means, at least one output means, switching means for switching between said input means and said output means and for connecting a selected input means to a selected output means during a predetermined time slot, and a router table means for controlling said switching means, said router table means including instructions which input means be connected to output means for a predetermined time slot, characterized in that said router table means is divided into a plurality of tables, each table having a weight which specifies the amount of bandwidth per reservation in one table in relation to a reservation in the other table(s).

Due to the invention, the size of the router table means is reduced, resulting in a reduction of the corresponding silicon area and overhead and, thus, in a saving of costs, which is important for the provision of an on-chip router network. Further, the invention allows for a finer bandwidth granularity for the same size of the router table means and, thus, the same costs, resulting in more efficient use of the available bandwidth in the network, since high bandwidth data streams can be covered by a higher weighted table such that fewer time slots need to be allocated. The invention can be used in all digital system-on-chip ICs.

Preferably, the weights of the tables are programmable.

Each table can include a number (S_(l)) of rows, and per predetermined time period the tables are cycled a number (w_(l)) of times corresponding to the respective weight (w_(l)≥1), so that preferably the effective slot cycle period (S_(e)) is

$$S_e = \sum_{l=1}^{L} w_l \cdot S_l$$

The way in which entries of the tables are enumerated depends on the latency requirements through a network the router is connected to.

In a further preferred embodiment comprising a plurality of buffer means, each connected between an input means and the switching means, respectively, each buffer means comprises a plurality of buffer portions corresponding to the plurality of tables, each buffer portion being allocated to a table, respectively, wherein the router table means is provided for controlling the buffer portions in accordance with the tables. Such a buffering concept is more elegant than a shared buffering concept, since the incoming flow control digits are stored in such buffer means per table so that the various levels of the TDMA schedule become logically independent. Preferably, said buffer means is a first-in-first-out (FIFO) buffer means.

The above described objects and other aspects of the present invention will be better understood by the following description and the accompanying Figures.

In the following a preferred embodiment of the present invention is described with reference to the drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic basic block diagram of a time-division multiplexed circuit-switching router;

FIG. 2 schematically shows a combination of two routers connected in series and the flow of four guaranteed-throughput data streams;

FIG. 3 schematically shows an example of a simple router network with two 2×2-routers and the flow of three data streams, two being best-effort and one being guaranteed-throughput;

FIG. 4 shows a schematic block diagram of a time-division multiplexed circuit-switching router including a multi-layer router table according to a preferred embodiment of the invention;

FIG. 5 shows a schematic diagram of the flow of three data streams, which propagate through a network consisting of two routers according to a preferred embodiment of the invention; and

FIG. 6 shows a schematic block diagram of a plurality of buffers which are included in the router of FIG. 4 per input.

DESCRIPTION OF A PREFERRED EMBODIMENT

The architecture of a simple router for circuit-switching is depicted in FIG. 1 for explanation purposes. The router consists of N input ports including buffers, M output ports and a switch to forward data from the inputs to the outputs (concurrently) according to a router table. Circuit-switching allows connections to be established over a physical path from a source to a destination for a certain amount of time (Leijten, J. A. J.; van Meerbergen, J. L.; Timmer, A. H.; Jess, J. A. G.; “Stream communication between real-time tasks in a high-performance multiprocessor”, Design, Automation and Test in Europe, 1998, Proceedings, 23-26 Feb. 1998, pages 125-131).

In the routers, the data is stored in queues for a certain amount of time for reasons of timing implementation. Consequently, circuit-switching over a router network differs from a shared bus TDMA architecture in that the data transport over the network involves multiple hops (one for each router on the path) instead of only one, wherein each hop (router) has a different router table. Furthermore, circuit-switching is a special form of TDMA in which master-slave pairs, or in the context of routers input-output port pairs, are scheduled as explained below.

The router table of an individual router contains the information to program a crossbar switch in a contention-free manner over time. For this reason, time is divided into fixed units of time called slots. During a slot, a unit of data called a flit (flow control digit) can be forwarded by the crossbar switch from a router input buffer to an output. The input/output mapping in a specific slot is specified by the router table T, being a matrix of size S×M, where S is the number of slot entries and M is the number of output terminals of the router. The elements of T are in the set {Ø, 1, . . . , N}. The value n=T(s, m), with 0≤s<S and 0<m≤M, means that in slot s, if n≠Ø, a flit is forwarded from input i_(n) to output o_(m). So, row s of T specifies the mapping in slot s. The slot assignment T is periodically repeated over time according to s=k mod S, with k being a slot iterator.
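By way of illustration only, the table lookup described above can be sketched as follows. The representation (a list of rows, with None standing for the empty entry Ø) and the example table contents are assumptions chosen for the sketch; they are not taken from the Figures.

```python
from typing import Dict, List, Optional

# None plays the role of the empty entry Ø; inputs and outputs are numbered from 1.
RouterTable = List[List[Optional[int]]]

def slot_mapping(table: RouterTable, k: int) -> Dict[int, int]:
    """Return {output m: input n} for slot iteration k, using s = k mod S."""
    s = k % len(table)  # S is the number of rows of T
    return {m: n for m, n in enumerate(table[s], start=1) if n is not None}

# Hypothetical table with S=4 slots and M=2 outputs: output 1 is unused,
# output 2 is fed alternately from input 1 and input 2.
EXAMPLE_TABLE: RouterTable = [
    [None, 1],
    [None, 2],
    [None, 1],
    [None, 2],
]

first = slot_mapping(EXAMPLE_TABLE, 0)   # slot 0
later = slot_mapping(EXAMPLE_TABLE, 5)   # slot 5 mod 4 = 1
```

Because each row holds exactly one entry per output, a row can never request two inputs for the same output.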

Accordingly, the router table of every router in the network has S time slots. There is a logical notion of synchronicity: all routers in a network are in the same fixed-duration slot. In a slot iteration k, at most one block of data is written per output port. The outputs of the routers in a network are connected to inputs of routers by means of links between input/output pairs. Such a link causes a block that is written to an output in slot iteration k to be present, at the next slot iteration, in the queue of the input that is connected via the link. During the next slot k+1 or later, the arrived blocks are again written to their appropriate output ports. The blocks thus propagate in a store-and-forward fashion. The latency a block incurs per router is equal to the duration of a slot multiplied by the difference between the arrival and departure time of the block (which is given by the reservations of two subsequent routers along the path). The bandwidth is guaranteed in multiples of the block size per S slots.

The slots reserved for a path from a source to a destination increase at least by one (modulo S) per router. If slot s is reserved in some router on the path and slot (s+q)%S, with q>0, is reserved in the next router on the path, the incurred latency for this part of the path is q slots.

The order in which blocks arrive at an input of a router must be the same as the order in which these blocks are written through one of the outputs of the router. This allows the queues connected to the inputs to be implemented by means of FIFOs.

The entries of the router table map outputs to inputs for every slot, i.e. T(s, o)=i. An entry is empty when there is no reservation for that output in that slot. No contention arises because there is at most one input per output. Sending a single input to multiple outputs (multicast) is possible.
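As a small sketch of the multicast property, a row of the table can be inverted to show which outputs each input feeds in a given slot. The function name and the row encoding (None for an empty entry) are hypothetical and serve only to illustrate the statement above.

```python
from collections import defaultdict
from typing import Dict, List, Optional

def fanout(row: List[Optional[int]]) -> Dict[int, List[int]]:
    """Invert one table row: {input n: [outputs m fed by n in this slot]}."""
    outputs_per_input = defaultdict(list)
    for m, n in enumerate(row, start=1):  # outputs are numbered from 1
        if n is not None:                 # skip outputs without a reservation
            outputs_per_input[n].append(m)
    return dict(outputs_per_input)

unicast = fanout([None, 2])   # input 2 feeds output 2 only
multicast = fanout([1, 1])    # input 1 feeds outputs 1 and 2 in the same slot
```

An input that appears in more than one column of a row is multicast in that slot, while each output still has at most one source.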

In a GT (Guaranteed-Throughput) routing approach, every GT token which is read in time slot s in some router is read in time slot (s+q)%S in the next router in the path the token follows. The value of q is at least one and is a result of the chosen schedule. It is preferably as small as possible, since the overall latency of a connection is equal to the sum of all q's along the path. Guaranteed-throughput (GT) services require resource reservation for worst-case scenarios, which can be expensive.
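The latency rule can be sketched as follows: given the slot reserved in each router along a path, the per-hop value q is the modular distance between consecutive reservations, and the path latency is the sum of all q's. The function name is hypothetical, and treating an unchanged slot index as a wait of one full revolution (q=S) is an assumption made so that q stays at least one.

```python
from typing import Sequence

def path_latency(reserved_slots: Sequence[int], S: int) -> int:
    """Sum of the per-hop latencies q (in slots) along a path of routers."""
    total = 0
    for s, s_next in zip(reserved_slots, reserved_slots[1:]):
        q = (s_next - s) % S      # next router reads in slot (s + q) % S
        if q == 0:
            q = S                 # assumption: same index means one full revolution
        total += q
    return total

# Hypothetical path through three routers with a table size of S=4:
# slot 3 in the first router, slot 0 in the second, slot 2 in the third.
latency = path_latency([3, 0, 2], S=4)
```

The first hop wraps around the table boundary, which the modulo operation handles without special casing.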

An example of a simple router network including two 2×2-routers R1 and R2 with a router table size S=4 is shown in FIG. 2. In this Figure, four GT connections are represented by the data streams s₁, s₂, s₃, and s₄. The number of time slots allocated for each data stream is shown in parentheses in FIG. 2.

The first output port (shown as the upper port in FIG. 2) of the first router R1 is unused and, consequently, the first column of the routing table is empty. The second column of the routing matrix of the first router R1 indicates that tokens from its inputs are written alternately to the second output port (shown as the lower port in FIG. 2). Consequently, both data streams s₁ and s₂ are routed with the desired bandwidth without contention in the first router R1. In the second router R2, the first output port (shown as the upper port in FIG. 2) receives tokens of the data streams s₁ and s₃. Since the tokens from the data stream s₁ are routed in the time slots 0 and 2 in the first router R1, they are routed in the time slots 1 and 3 in the second router R2. This is seen by the two “1” entries in the first column of the router table of the second router R2. The single time slot required by the data stream s₃ is scheduled in the time slot 2 of the first column. Similarly, as indicated by “1” in the second column of the router table of the second router R2, tokens of the data stream s₂ are scheduled in the time slots 0 and 2. Finally, the tokens of the data stream s₄ are scheduled in the time slot 1.

It is not required that a GT token is available in every reserved time slot.

When no GT packet arrives in a reserved time slot, a BE (best-effort) packet can be sent over the claimed but unused time slot of the link. Best-effort (BE) services do not reserve any resource, and hence provide no guarantees, but use resources well because they are typically designed for average-case scenarios instead of worst-case scenarios.

The number S of slots in the router table determines the granularity in which the total amount of bandwidth of a link can be divided. If B represents the amount of bandwidth per link, then a single connection can allocate bandwidth in chunks of B/S. Hence, increasing S, which means increasing the number of slot-table entries of all routers, results in a finer granularity. However, a bigger router table results in higher costs of the router in terms of silicon area. Current estimations show that the router table can take as much as 50% of the total router silicon area. A large router table also has an operational disadvantage: for the high and medium bandwidth connections, a large number of slots must be programmed. This is expensive in terms of the connection setup and teardown time.

FIG. 3 shows as an example a combination of two 2×2-routers connected in series, wherein the two routers are indicated by R1 and R2, and the network terminals are identified by t_(i) (i=1, 2, . . . , 6).

Assume that the first router R1 receives BE packets via terminal t₁, which are all destined for the terminal t₅, and that these packets require 10% of the capacity of a link. Similarly, packets go from the terminal t₂ to the terminal t₆ and require only 1% of the link capacity. The second router R2 receives a GT data stream via the terminal t₄ which is destined for the terminal t₆. The GT data stream claims and uses 99% of the bandwidth and thus occupies the output link from output port b of the router R2 to the terminal t₆ for 99% of the time. So, the BE stream sharing port b can send a flit only in the remaining 1% of the link capacity, and every time GT data arrives for port b the transmission of the BE packet over port b is pre-empted.

This can cause long latencies for the packets of the 1% BE data stream, wherein latency is defined as the time a packet takes to be transported over the network. It also causes the link between the routers R1 and R2 to be occupied almost continuously by the 1% BE stream, because flits of different packets are not interleaved. Thus, BE packets of the 10% data stream obtain less than 10% of the rate of the link. This means that in the example of FIG. 3 the link between the routers R1 and R2 has a utilization that is even below 11% of its theoretical capacity.

In order to overcome this problem there are basically three approaches: (1.) using virtual cut-through routing rather than so-called wormhole routing, (2.) performing GT communication in relatively large blocks of data and large periods of no data, and (3.) using a GT service for the 1% BE stream.

The first approach guarantees that a complete packet will be accepted in the next router such that the incoming link of the next router does not block. However, this is at the cost of extra memory.

The second approach ensures that flit pre-emption rarely occurs. When the 99% of GT data is grouped in blocks of 10 time units, this bandwidth is obtained by alternately sending 99 blocks of data followed by 10 time units of nothing. When the packet size of the BE data stream is small compared to such 10 time units, a complete packet of the 1% BE data stream is sent in the 10 time units, and the link between the routers R1 and R2 can be used by the 10% BE data stream immediately after the packet has been sent. While the first approach suffers from additional memory requirements in the router, this second approach suffers from additional latency in the BE data stream.

In the third approach, a GT service is used to realize the connection between the terminals t₂ and t₆. Consequently, the relatively low bandwidth stream is scheduled at specific moments in time by means of reserving 1 out of every 100 slots in the routing table. This requires the slot table to have a size of at least 100 entries. Since a GT service results in a circuit-switched connection during the reserved period over time, the connection uses at most 1% of the link capacity between the routers R1 and R2. The remaining link capacity is available for the 10% BE stream.

The third approach requires a provision for efficiently storing a set of connections with both low and high bandwidth requirements. This is achieved by means of a layered reservation table. Given the substantial amount of area overhead consumed by the reservation table, it is structured into L layers: T=(T₁, . . . , T_(L)). The table of layer l=1, . . . , L has a size of S_(l) rows and a weight of w_(l)≥1. The weight specifies the amount of bandwidth a slot in the corresponding reservation table represents in proportion to the weight of the other layers. This is realized by constructing a combined schedule of the L tables, in which per period the tables T_(l), l=1, . . . , L are cycled w_(l) times respectively. Hence the effective slot cycle period S_(e) becomes

$$S_e = \sum_{l=1}^{L} w_l \cdot S_l \qquad (1)$$

and this at the cost of far fewer physical reservation table entries

$$S = \sum_{l=1}^{L} S_l \qquad (2)$$

From equation (1) it follows that a slot at layer l corresponds to a fraction w_(l)/S_(e) of the total link bandwidth B.
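Equations (1) and (2) and the resulting bandwidth fraction per slot can be checked with the following minimal sketch. The function names and the two-layer configuration (one row per layer, with weights 1 and 99) are assumptions chosen only to reproduce a 1% / 99% split; the source does not specify the row counts.

```python
from fractions import Fraction
from typing import List, Tuple

Layer = Tuple[int, int]  # (S_l rows, w_l weight), with w_l >= 1

def effective_period(layers: List[Layer]) -> int:
    """Equation (1): S_e is the sum of w_l * S_l over all layers."""
    return sum(w * s for s, w in layers)

def physical_entries(layers: List[Layer]) -> int:
    """Equation (2): S is the sum of S_l over all layers."""
    return sum(s for s, _ in layers)

def slot_fraction(layers: List[Layer], index: int) -> Fraction:
    """Fraction w_l / S_e of the link bandwidth B carried by one slot of a layer."""
    _, w = layers[index]
    return Fraction(w, effective_period(layers))

# Hypothetical two-layer table: layer 1 has weight 1, layer 2 has weight 99.
layers = [(1, 1), (1, 99)]
S_e = effective_period(layers)
S = physical_entries(layers)
```

With these assumed values, two physical entries behave like a period of 100 slots, and a slot in the second layer carries 99/100 of the link bandwidth.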

Such a router architecture including a multi-layer router table is schematically shown in FIG. 4.

FIG. 5 shows the filling of the router tables for the situation illustrated in FIG. 3 according to the multi-layer approach. Here, two layers are required. One stream is a best-effort stream, which is denoted by be, and the two other streams are guaranteed-throughput streams, which are denoted by gt₁ and gt₂. The router table of each router, which schedules both streams, is divided into two layers, each having a different weight. The first layer 1 has a weight of 1 and supports gt₂. The second layer 2 has a weight of 99 and supports gt₁. The matrices T1₁ and T2₁ define two sub-tables associated with the first layer 1 for the routers R1 and R2 respectively. The matrices T1₂ and T2₂ give the reservations for the second layer 2. Consequently, a reservation of a slot in the second layer 2 corresponds to 99 times as much bandwidth as a reservation of a slot in the first layer 1. As a result of the two-layer approach, the total number of slot entries S does not need to be larger than 3 for this case.

The way in which the entries of the various tables are enumerated depends on the latency requirements through the network and on whether the extra cost of independent buffering per layer is acceptable.

The following description deals with two buffer options. In both cases, switching from one layer to another is assumed to be done synchronously for all routers in the network.

Since the tables of the various layers are interleaved in time, the layer controller of the router will, sooner or later, interrupt the enumeration of the table of one layer to continue with one of the other layers. If a first-in-first-out (FIFO) buffer policy is employed per input, the FIFOs should not contain data that belongs to the current level when the controller switches to another layer; otherwise the data of different layers becomes mixed up. It is not trivial to find such a point in the tables of all routers for a specific layer, because in general many paths through the network overlap each other in time. A natural point where a clean switch to a different layer can be performed without intersecting paths could be after the last entry of the table. But in the case of a circular schedule such a point does not exist at all. Namely, a circular schedule allows a path through the routers to be divided into two pieces: the first part uses slots at the end of the table, and the second part uses slots at the beginning of the table. In other words, a path can be wrapped over the boundary of the table. In practice, a schedule with valid interruption points for the “single FIFO per input approach” can result in a deterioration of the link utilization.

A more elegant buffer approach stores the incoming flits in a FIFO per level, as depicted in FIG. 6 in conjunction with FIG. 4. As shown in FIG. 4, a plurality of buffers Q is provided, wherein each input i₁ to i_(N) is coupled to such a buffer Q. In FIG. 6, the construction of such a buffer Q is schematically shown. In this concept, the various levels of the TDMA schedule use different queues, as such becoming logically independent. Hence, reservation tables are allowed to be circular and switching between the layers is possible at any moment in time.
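A minimal sketch of such a buffer Q, assuming one FIFO per layer, is given below. The class and method names are hypothetical and the sketch models only the queueing behaviour, not the hardware of FIG. 6.

```python
from collections import deque
from typing import Any, Optional

class LayeredInputBuffer:
    """One input buffer Q holding a separate FIFO for each table layer."""

    def __init__(self, num_layers: int) -> None:
        self.queues = [deque() for _ in range(num_layers)]

    def push(self, layer: int, flit: Any) -> None:
        """Store an incoming flit in the FIFO of the layer that scheduled it."""
        self.queues[layer].append(flit)

    def pop(self, layer: int) -> Optional[Any]:
        """Remove the oldest flit of the given layer, or None if it is empty."""
        queue = self.queues[layer]
        return queue.popleft() if queue else None

# Flits of two layers arrive interleaved; each layer still drains in order.
buf = LayeredInputBuffer(num_layers=2)
buf.push(0, "a")
buf.push(1, "x")
buf.push(0, "b")
first_of_layer_1 = buf.pop(1)
```

Because layer 1 can be served while flits of layer 0 are still queued, the controller may switch layers at any slot without first draining a shared FIFO.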

It is to be noted that the latency through the network is not the same for the two buffering strategies.

For reasons of convenience, the ratio between the low and high bandwidth connections and the number of connections are kept small in the above example, namely 1 to 99 and 3, respectively. In practice, however, the ratio and the number of connections can be much larger.

The advantage of a multi-level slot table is shown as follows. For reasons of simplicity, consider a network-on-chip consisting of just one router according to FIG. 4. Furthermore, let us focus on the guaranteed-throughput connections that flow through one particular output port. Suppose there are 60 GT streams through this output. The bandwidth requirements of these streams are as follows: 50 GT-streams of 1 Mb/s and 10 GT-streams of 1 Gb/s. Hence, the total aggregated bandwidth is at least 10.05 Gb/s.

Three examples A, B and C of the slot-table, which differ in the number of layers and the number of slot-table entries, will be discussed as follows.

Example A makes use of one slot-table consisting of 10050 slots. Let the bandwidth of a single link be 10.05 Gb/s such that the bandwidth per slot becomes 1/10050×10.05 Gb/s=1 Mb/s. Now the 50 GT-streams of 1 Mb/s need to reserve 1 slot each and the 10 GT-streams of 1 Gb/s need to reserve 1000 slots each.

Example B again makes use of a single-layered slot-table, but now consisting of just 250 slot entries. This reduced number of slot entries saves a significant amount of costs. The optimal distribution of the 250 slots over the 60 streams is as follows: the 50 streams of 1 Mb/s use one slot each, and the 10 streams of 1 Gb/s use the remaining slots, which means 20 each. Now, to fulfil the bandwidth requirement of all streams, the bandwidth of the link must be 250/20×1 Gb/s=12.5 Gb/s. Consequently, the bandwidth per slot is 50 Mb/s. One can see that this realization has disadvantages: firstly, it requires links with about 24% more bandwidth than Example A and, secondly, this extra bandwidth is not available for other connections, since the bandwidth granularity of 50 Mb/s does not allow it.

Example C makes use of a two-layer slot-table. The first layer of the slot-table consists of 50 entries with a bandwidth per slot of 1 Mb/s. The second layer of the slot-table consists of 10 entries, where the bandwidth of each slot is 1 Gb/s. Consequently, the weights w_(l) of the two layers are 1 and 1000, respectively. This realization requires the bandwidth of the link to be 10.05 Gb/s, just as in Example A; however, only 60 slot-table entries are now needed in total, which is just 0.6% of the number in Example A.
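The arithmetic of Examples A, B and C can be reproduced with the following sketch. The variable names are hypothetical; the stream counts, bandwidths, slot counts and weights are those stated in the three examples.

```python
from fractions import Fraction

MBPS, GBPS = 10**6, 10**9
LOW_STREAMS, HIGH_STREAMS = 50, 10   # 50 streams of 1 Mb/s, 10 streams of 1 Gb/s

# Example A: single layer, one slot per 1 Mb/s.
slots_a = LOW_STREAMS * 1 + HIGH_STREAMS * (GBPS // MBPS)
link_a = slots_a * MBPS

# Example B: single layer with 250 entries, 20 slots per 1 Gb/s stream.
slots_per_high_b = 20
slots_b = LOW_STREAMS * 1 + HIGH_STREAMS * slots_per_high_b
per_slot_b = Fraction(GBPS, slots_per_high_b)   # bandwidth one slot must carry
link_b = slots_b * per_slot_b
extra_b = link_b / link_a - 1                   # relative over-provisioning vs. A

# Example C: two layers given as (S_l, w_l); a weight-1 slot carries 1 Mb/s.
layers_c = [(50, 1), (10, 1000)]
entries_c = sum(s for s, _ in layers_c)
period_c = sum(s * w for s, w in layers_c)      # effective period S_e
link_c = period_c * MBPS
saving_c = Fraction(entries_c, slots_a)         # table size relative to A
```

Under these figures, Example C needs the same link bandwidth as Example A while storing 60 entries instead of 10050, whereas Example B trades a smaller table for a faster link.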

Although the invention is described above with reference to examples shown in the attached drawings, it is apparent that the invention is not restricted to them, but can vary in many ways within the scope disclosed in the attached claims.

1. A router, comprising a plurality of input means (i₁, . . . , i_(N)), at least one output means (o₁, . . . , o_(M)), switching means for switching between said input means (i₁, . . . , i_(N)) and said output means (o₁, . . . , o_(M)) and for connecting a selected input means to output means during a predetermined time slot, and a router table means for controlling said switching means, said router table means including instructions which input means be connected to output means for a predetermined time slot, characterized in that said router table means is divided into a plurality of tables (T_(l)) (l=1, . . . , L), each table having a weight (w_(l)≥1) which specifies the amount of bandwidth per reservation in one table in relation to a reservation in the other table(s).
2. The router according to claim 1, wherein the router table means is divided into a plurality of hierarchical levels and each table is allocated to a certain hierarchical level.
3. The router according to claim 1, wherein the weights of said tables are programmable.
4. The router according to claim 1, wherein each table (T_(l)) includes a number (S_(l)) of rows.
5. The router according to claim 1, wherein per predetermined time period the tables (T_(l)) are cycled a number (w_(l)) of times corresponding to the respective weight (w_(l)≥1).
6. The router according to claim 4, wherein the effective slot cycle period (S_(e)) is
$$S_e = \sum_{l=1}^{L} w_l \cdot S_l$$
7. The router according to claim 1, wherein the way in which entries of the tables (T_(l)) are enumerated depends on latency requirements through a network of which the router is a part.
8. The router according to claim 1, comprising a plurality of buffer means (Q), each connected between an input means (i₁, . . . , i_(N)) and the switching means, respectively, wherein each buffer means (Q) comprises a plurality of buffer portions (1, . . . , L) corresponding to the plurality of tables (T_(l)), each buffer portion being allocated to a table, respectively, wherein the router table means is provided for controlling the buffer portions in accordance with said tables.
9. The router according to claim 8, wherein said buffer means (Q) is a first-in-first-out buffer means.